Nested collections and polytypism
Authors
Now at the Computing Laboratory, University of Kent, Canterbury, Kent, UK (email: E.A.Boiten@ukc.ac.uk); the research was carried out at Eindhoven University of Technology.
Department of Mathematics and Computing Science, Eindhoven University of Technology, Eindhoven, The Netherlands (email: paulh@win.tue.nl).
Abstract
A point-free calculus of so-called collection types is presented, similar to the monadic calculus of Tannen, Buneman and Wong. We observe that our calculus is parametrised by a monad, thus making the calculus polytypic. A novel contribution of the paper is to discuss situations in which a single application involves more than one collection type. In particular, we outline the contribution to database research that may be obtained by exploiting current developments in polytypic programming.
Introduction and overview
Collection types such as trees, lists and bags have been studied extensively in computing science. In particular, in the research area of formal program development, the observation (attributed by L. Meertens to H. Boom) that these types form a hierarchy has proved fruitful. The most important aspect of this so-called Boom hierarchy is that a calculus of higher-order functions like map, reduce and filter can be defined on all its types. This calculus is commonly known as the Bird-Meertens Formalism (BMF), and it is widely used for the development and description of functional and parallel programs. A generalisation of this calculus was found in the category-theoretic and relational approaches to abstract data types, and it was observed that the Boom hierarchy types form an instance of another popular category-theoretic concept, the monad. This provided a new syntax and calculus for comprehensions on these types.
In the area of databases, the interest in these types arises from the quest for query languages for databases containing structured data. The traditional flat relational database model only describes sets of tuples whose attributes are assumed to be of atomic type (the so-called First Normal Form). Various nested relational calculi, i.e. calculi providing for set-valued attributes, have been proposed. The most general of these is the monadic calculus described by Tannen, Buneman and Wong. The observation that collection types are monads with additional properties ("ringads") can be attributed to Wadler and Trinder. They prove that their calculus, using the set monad, is equivalent to the nested relational calculus.
The calculus we present here is inspired by, and can be instantiated to, that monadic calculus. Thus this paper does not claim to directly advance research in database query languages. Rather, our intent is to present the state of the art in that area in such a way that it connects more easily with recent developments in formal program development. This explains the differences between our calculus and the monadic calculus: our calculus is non-extensional (point-free) to facilitate equational reasoning; the monad (functor) involved is an explicit parameter; and instead of an underlying signature of basic functions we assume a category of partial functions with a few extra properties.
The importance of these differences has to do with the emerging interest in so-called polytypic programs, the current focus of research in the BMF. Such programs are parametrised by type constructors, as opposed to polymorphic programs, which are parametrised by types. Beginning with the work of Malcolm, it has been observed that several programming concepts and building blocks can be profitably formulated in polytypic terms, thus enhancing their reusability.
The presentation of the elements of our calculus is such that it amounts to a constructive proof that it is, in our setting, the smallest orthogonal calculus that can describe the flat relational calculus; in other words, our calculus instantiated with the set monad is the extension of the flat relational calculus with sets as first-class citizens. The presentation is in two layers: the first layer identifies tuples and the basic
operations on them in a category of partial functions with a special product; the second layer lifts these operations to operations on sets of tuples. Partiality of the basic operations requires some special attention in this lifting procedure. In the final section of this paper we describe issues for further research and their relation to research issues in formal program development.
Special products: the tuple operations
In relational database theory, a tuple is either a function from a set of labels to a set of values, or a member of a product type. We choose the latter approach, confident that the strict categorical typing will provide the labels.
(Note: any reference to relational calculus in this paper, except for this note, should be taken to refer to relational database calculus and the corresponding algebra, rather than to Tarski's calculus of binary relations, even though the latter plays a crucial role in polytypic programming.)
We work in a category of partial functions and write the typing of arrows in such a way that it looks natural for compositions: if $f \in A \leftarrow B$ and $g \in B \leftarrow C$, then $f \cdot g \in A \leftarrow C$. Sets are represented as (partial) identity arrows, as usual. In particular, for each arrow $f \in A \leftarrow B$ we assume the existence of an arrow $f{>} \in B \leftarrow B$ (the domain of $f$), which equals the identity on $f$'s domain and is undefined elsewhere.
We assume a particular product $\times$ whose unique mediating arrow, written $f \vartriangle g$, is characterised by the following axiom.
Axiom (Product): $f \vartriangle g = h \;\equiv\; f \cdot g{>} = {\ll_{A,B}} \cdot h \;\wedge\; g \cdot f{>} = {\gg_{A,B}} \cdot h$,
assuming that $f \in A \leftarrow C$ and $g \in B \leftarrow C$, with ${\ll_{A,B}} \in A \leftarrow A \times B$ and ${\gg_{A,B}} \in B \leftarrow A \times B$ the projections on $A$ and $B$. Consequently, $f \vartriangle g$ is defined exactly where both $f$ and $g$ are defined. We make the product a bifunctor by setting $f \times g = (f \cdot {\ll}) \vartriangle (g \cdot {\gg})$. We take $1$ to be a terminal object in the category, with $!_A$ the unique total arrow of type $1 \leftarrow A$.
In database theory, tuples are elements of $n$-ary products for arbitrary $n$. To define unique $n$-ary products, we make the binary product associative with unit element $1$. This is axiomatised by equating the relevant isomorphisms to the appropriate identities:
Axiom (Associativity): $\mathit{ass}_{A,B,C} \mathrel{\overset{\mathrm{def}}{=}} (\mathit{id}_A \times {\ll_{B,C}}) \vartriangle ({\gg_{B,C}} \cdot {\gg_{A,B \times C}}) = \mathit{id}_{A \times B \times C}$
Axiom (Unit): $!_A \vartriangle \mathit{id}_A = \mathit{id}_A = \mathit{id}_A \vartriangle !_A$
Arrows can only be equal if their types are equal, so $\times$ is associative on objects as well, which justifies writing $A \times B \times C$ without brackets above.
From the desire to address any field of an $n$-ary product directly with a single projection (for example, the $B$ field in $A \times B \times C$), it follows that the product should be commutative as well. However, if there are multiple fields of one and the same type, it seems likely that we would wish to distinguish those. So we introduce the following axiom.
Axiom (Semi-commutativity): Let $\mathit{swap}_{A,B} = {\gg_{A,B}} \vartriangle {\ll_{A,B}}$, the isomorphism between $A \times B$ and $B \times A$. If $A$ and $B$ are relatively prime, then $\mathit{swap}_{A,B} = \mathit{id}_{A \times B}$.
Two types are relatively prime when their greatest common divisor is $1$. In order to be able to define division and greatest common divisors, we complete our axiomatisation of the product by postulating:
Axiom (Prime factorisation): All objects have a unique prime factorisation.
Now we can address any field directly, provided it occurs once: the projection on $B$ in $A$ is ${\ll_{B,A/B}}$.
Since sets are represented by partial identity arrows, it makes sense also to represent predicates (for selection, i.e. filtering) in the same way. In particular, we assume the existence of $\mathit{equal}_A \in A \times A \leftarrow A \times A$, which is defined exactly on those pairs $(a,b)$ of type $A \times A$ such that $a = b$. As an abbreviation for the inverse of the duplicating function $\mathit{id}_A \vartriangle \mathit{id}_A$ we use $\mathit{diag}_A \in A \leftarrow A \times A$, defined by $\mathit{diag}_A = {\ll_{A,A}} \cdot \mathit{equal}_A$. These give us the building blocks for projection and selection on the level of sets in the next section.
A final basic operation is the natural join of two tuples, which is only defined if they have matching values for fields of the same label (type), and in that case contains the combination of all their fields. Using division, we define it as
$\mathit{join}_{A,B} = \mathit{id}_{A/C} \times \mathit{id}_{B/C} \times \mathit{diag}_C$,
where $C$ is the greatest common divisor of $A$ and $B$, and $A/C$ and $B/C$ are both required to be relatively prime with $C$ (so that, by semi-commutativity, $A \times B = A/C \times B/C \times C \times C$). The result type of $\mathit{join}_{A,B}$ is the least common multiple of $A$ and $B$. For $A$ and $B$ relatively prime, the join equals the Cartesian product, as one might expect.
Second layer: lifting to sets
Intuitively, we would like to define the operators at the set level as fairly simple set comprehensions. For example,
projecting a set $S$ over the type $A \times B$ on $A$ is, using an infix dot for function application,
$\overline{\ll}_{A,B}.S = \{\, {\ll_{A,B}}.x \mid x \in S \,\}$.
The map (apply-to-all) operation on sets does exactly this, i.e. $\overline{\ll}_{A,B} = \mathit{map}_P\,{\ll_{A,B}}$, where we index the map operation with the functor involved (here $P$, for powerset). Also, we have $\overline{\gg}_{A,B} = \mathit{map}_P\,{\gg_{A,B}}$, which may differ from $\overline{\ll}_{B,A}$ when $A$ and $B$ are not relatively prime.
However, using $\mathit{map}_P$ for comprehensions is not good enough in general. For total operations like the projections there is no problem. However, some of the other functions we would like to lift to the level of sets are partial, for example predicates encoded as partial identity functions, and that needs to be taken into account. Assuming $\mathit{select}_Q$ is the partial identity representing the predicate $Q$, the lifted definition of selection as a comprehension is
$\sigma_Q.S = \{\, x \mid x \in S \wedge \mathrm{Def}(\mathit{select}_Q.x) \,\}$,
where Def is a meta-predicate accounting for the partiality of $\mathit{select}_Q$. The function $\mathit{map}_P$ would produce the wrong result if we used it to lift such functions to the level of sets: it would deliver functions that are undefined whenever any of the elements in the set does not satisfy the predicate.
Using some more basic functions on sets, we can resolve this problem. The results of the partial function are packed using the singleton set former $\mathit{unit}_P \in PA \leftarrow A$; we return the empty set $\mathit{zero}_P \in PA \leftarrow B$ for values on which the function is undefined; and $\mathit{flatten}_P \in PA \leftarrow PPA$ flattens the resulting set of sets:
$\mathit{lift}_P\,f = \mathit{flatten}_P \cdot \mathit{map}_P\,(\mathit{Tot}_P\,f)$, where $\mathit{Tot}_P\,f.x = \mathit{unit}_P.(f.x)$ if $\mathrm{Def}(f.x)$, and $\mathit{Tot}_P\,f.x = \mathit{zero}_P.x$ otherwise.
It is easy to prove that this makes $\mathit{lift}_P$ the arrow part of a functor that coincides with $\mathit{map}_P$ on objects and total functions. That $\mathit{lift}_P$ is a functor is a kind of healthiness criterion: it means we can use equational reasoning on the level of function compositions for expressions involving $\mathit{lift}_P$, in particular distribution of functors over composition. Using $\mathit{lift}_P$, we can define selection by $\sigma_Q = \mathit{lift}_P\,\mathit{select}_Q$.
(Notes: actually, $P$ is also a functor, viz. from the base category into the corresponding Kleisli category. For any $F$, the corresponding construction can be expressed without applications and case distinctions if the base category is a Boolean division allegory.)
For join, however, we need to assume one more basic function. The function $\mathit{lift}_P\,\mathit{join}_{A,B}$ has type $PC \leftarrow P(A \times B)$ for some type $C$, but the natural join $\bowtie_{A,B}$ has type $PC \leftarrow PA \times PB$. We now need a transformation from $PA \times PB$ to $P(A \times B)$, viz. the cross product. This will be defined using the function that pairs all elements of a set with one particular value, also known as the strength of the powerset functor. Defined as a comprehension, it is
$\mathit{str}_P.(S, x) = \{\, (y, x) \mid y \in S \,\}$.
In general, the strength of a functor $F$ is a natural transformation from $FA \times B$ to $F(A \times B)$, which has interesting links with the concept of membership. A related function, definable in terms of the strength, is
$\mathit{stl}_P.(x, S) = \{\, (x, y) \mid y \in S \,\}$, i.e. $\mathit{stl}_{P\,A,B} = \mathit{map}_P\,\mathit{swap}_{B,A} \cdot \mathit{str}_{P\,B,A} \cdot \mathit{swap}_{A,PB}$.
The cross product can then be computed by two applications of these functions, one of each, with one nested inside the other. At the level of sets it does not matter which one is chosen in which position, since there is no order between the elements:
$\mathit{cross}_P = \mathit{flatten}_P \cdot \mathit{map}_P\,\mathit{stl}_P \cdot \mathit{str}_P$.
With this we can define join at the set level:
$\bowtie_{A,B} = \mathit{lift}_P\,\mathit{join}_{A,B} \cdot \mathit{cross}_{P\,A,B}$.
At this point, all standard operators of relational algebra that operate elementwise have been lifted to sets. Finally, one wishes to have the normal set operations available as operations on databases. Union is not definable with the current set of primitive operations, so we add a primitive $\mathit{union}_P$, one of whose characteristic properties is that it forms a monoid with $\mathit{zero}_P$. Intersection can be defined using the cross product:
$\cap_A = \mathit{lift}_P\,\mathit{diag}_A \cdot \mathit{cross}_{P\,A,A}$,
and set difference is also expressible using equality, in fact by encoding boolean false as the empty set. Let $\mathit{notmem}$ be the partial identity function which is only defined on pairs of an element and a set such that the element is not in the set:
$\mathit{notmem} = \bigl(\mathit{diag}_{PA} \cdot (\mathit{zero}_P \vartriangle (\mathit{lift}_P\,\mathit{diag}_A \cdot \mathit{stl}_{P\,A,A}))\bigr){>}$.
Here $\mathit{lift}_P\,\mathit{diag}_A \cdot \mathit{stl}_{P\,A,A}$ maps $(x, S)$ to $\{x\}$ if $x \in S$ and to the empty set otherwise, so comparing its result with $\mathit{zero}_P$ tests non-membership.
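The axioms on the special product make every type behave like a number with a unique prime factorisation, so greatest common divisors, least common multiples and division of types are well defined. As a minimal sketch, assuming that a type can be modelled in Python as a multiset of prime field names (a `collections.Counter`), the side conditions and the result type of $\mathit{join}_{A,B}$ can be computed as follows. The helper names (`ty`, `join_type`, etc.) and the field names are hypothetical.

```python
from collections import Counter


# Illustrative model (an assumption, not the paper's construction): a product
# type A x B x ... is represented by the multiset of its prime factors.
def ty(*primes: str) -> Counter:
    """The product of the given prime types; ty() is the unit type 1."""
    return Counter(primes)


def gcd(a: Counter, b: Counter) -> Counter:
    """Greatest common divisor: common factors with minimum multiplicity."""
    return a & b


def lcm(a: Counter, b: Counter) -> Counter:
    """Least common multiple: all factors with maximum multiplicity."""
    return a | b


def div(a: Counter, c: Counter) -> Counter:
    """Type division a / c, defined only when c divides a."""
    if any(a[p] < n for p, n in c.items()):
        raise ValueError("divisor does not divide the type")
    return a - c


def relatively_prime(a: Counter, b: Counter) -> bool:
    """Relatively prime: the greatest common divisor is the unit type 1."""
    return not gcd(a, b)


def join_type(a: Counter, b: Counter) -> Counter:
    """Result type of join_{A,B}, after checking its side conditions."""
    c = gcd(a, b)
    if not (relatively_prime(div(a, c), c) and relatively_prime(div(b, c), c)):
        raise ValueError("A/C and B/C must be relatively prime with C")
    return lcm(a, b)  # equals A/C x B/C x C


employee = ty("Name", "Dept")
department = ty("Dept", "Floor")
print(sorted(join_type(employee, department).elements()))  # ['Dept', 'Floor', 'Name']
```

With `ty("X", "X")` and `ty("X")` the side condition fails, which mirrors the requirement that repeated fields of one type are distinguished rather than joined.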
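The lifting construction $\mathit{lift}_P\,f = \mathit{flatten}_P \cdot \mathit{map}_P\,(\mathit{Tot}_P\,f)$ can be tried out directly. The following is a minimal Python sketch, assuming that a partial function is an ordinary function returning `None` where it is undefined (so `None` is never a legitimate result) and that finite sets are `frozenset`s; `compose`, `naive_map_p` and `select` are hypothetical helpers. It contrasts lifted selection with a naive `map_P`, which is undefined as soon as one element fails the predicate.

```python
from typing import Callable, FrozenSet, Iterable, Optional

# Assumption: a partial function returns None where it is undefined.
Partial = Callable[[object], Optional[object]]


def compose(f: Partial, g: Partial) -> Partial:
    """f . g for partial functions: undefined wherever g or f is undefined."""
    def h(x):
        y = g(x)
        return None if y is None else f(y)
    return h


def unit_p(x) -> FrozenSet:
    return frozenset([x])  # singleton set former unit_P


def zero_p(_x) -> FrozenSet:
    return frozenset()  # zero_P: the empty set, whatever the argument


def map_p(f: Callable, s: FrozenSet) -> FrozenSet:
    return frozenset(f(x) for x in s)  # map_P for a total f


def flatten_p(ss: Iterable[FrozenSet]) -> FrozenSet:
    return frozenset(x for s in ss for x in s)  # union of a set of sets


def naive_map_p(f: Partial, s: FrozenSet) -> Optional[FrozenSet]:
    """map_P applied to a partial f: undefined if f fails on any element."""
    ys = [f(x) for x in s]
    return None if any(y is None for y in ys) else frozenset(ys)


def tot_p(f: Partial) -> Callable[[object], FrozenSet]:
    """Tot_P f: unit_P (f x) where f x is defined, zero_P x otherwise."""
    def g(x):
        y = f(x)
        return zero_p(x) if y is None else unit_p(y)
    return g


def lift_p(f: Partial) -> Callable[[FrozenSet], FrozenSet]:
    """lift_P f = flatten_P . map_P (Tot_P f)."""
    return lambda s: flatten_p(map_p(tot_p(f), s))


def select(pred: Callable[[object], bool]) -> Partial:
    """select_Q: the partial identity defined exactly where pred holds."""
    return lambda x: x if pred(x) else None


numbers = frozenset(range(1, 7))
even = select(lambda n: n % 2 == 0)
print(sorted(lift_p(even)(numbers)))  # [2, 4, 6]
print(naive_map_p(even, numbers))  # None
```

Because $\mathit{lift}_P$ is a functor, `lift_p(compose(f, g))` agrees with applying `lift_p(g)` and then `lift_p(f)`, and for total functions `lift_p` coincides with `map_p`.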
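The strength functions, the cross product and the set-level natural join can be sketched in the same style. This sketch assumes that a tuple is a `frozenset` of `(label, value)` pairs in which every label occurs at most once, so it only covers the case in which all field types are relatively prime; `row`, `join` and `natural_join` are hypothetical names, and partiality is again modelled by returning `None`.

```python
from typing import Callable, FrozenSet, Optional, Tuple


def map_p(f: Callable, s: FrozenSet) -> FrozenSet:
    return frozenset(f(x) for x in s)


def flatten_p(ss) -> FrozenSet:
    return frozenset(x for s in ss for x in s)


def lift_p(f: Callable) -> Callable[[FrozenSet], FrozenSet]:
    """lift_P f: keep f(x) for each x on which the partial f is defined."""
    def tot(x):
        y = f(x)
        return frozenset() if y is None else frozenset([y])
    return lambda s: flatten_p(map_p(tot, s))


def swap(pair: Tuple) -> Tuple:
    a, b = pair
    return (b, a)


def str_p(pair: Tuple) -> FrozenSet:
    """Strength: str_P.(S, x) = {(y, x) | y in S}."""
    s, x = pair
    return frozenset((y, x) for y in s)


def stl_p(pair: Tuple) -> FrozenSet:
    """stl_P = map_P swap . str_P . swap, i.e. (x, S) -> {(x, y) | y in S}."""
    return map_p(swap, str_p(swap(pair)))


def cross_p(pair: Tuple) -> FrozenSet:
    """cross_P = flatten_P . map_P stl_P . str_P."""
    return flatten_p(map_p(stl_p, str_p(pair)))


def row(**fields) -> FrozenSet:
    """A tuple with labelled fields (hypothetical helper)."""
    return frozenset(fields.items())


def join(pair: Tuple) -> Optional[FrozenSet]:
    """Tuple join: defined only if shared labels carry equal values."""
    t, u = pair
    dt, du = dict(t), dict(u)
    if any(dt[k] != du[k] for k in dt.keys() & du.keys()):
        return None
    return t | u  # all fields of both tuples, shared ones once


def natural_join(pair: Tuple) -> FrozenSet:
    """Set-level join: lift_P join . cross_P."""
    return lift_p(join)(cross_p(pair))


employees = frozenset({row(name="ann", dept="cs"), row(name="bob", dept="math")})
departments = frozenset({row(dept="cs", floor=3), row(dept="physics", floor=1)})
for t in natural_join((employees, departments)):
    print(sorted(t))  # [('dept', 'cs'), ('floor', 3), ('name', 'ann')]
```

When the two relations share no labels, `join` is always defined and `natural_join` degenerates to the Cartesian product, as stated for relatively prime types.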
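Intersection and the non-membership test can be written with the same building blocks. In this minimal sketch (again assuming `None` for "undefined" and `frozenset` for sets), `notmem` is computed as the domain of $\mathit{diag}_{PA} \cdot (\mathit{zero}_P \vartriangle (\mathit{lift}_P\,\mathit{diag}_A \cdot \mathit{stl}_{P\,A,A}))$; `split` and `domain` are hypothetical helpers standing for $\vartriangle$ and the domain operator. The final `difference` function, which lifts the first projection of `notmem` over the strength, is our own illustrative assumption of how set difference could be assembled, not a definition taken from the text.

```python
from typing import FrozenSet

# Assumption: partial functions return None where they are undefined.


def compose(f, g):
    """f . g on partial functions."""
    def h(x):
        y = g(x)
        return None if y is None else f(y)
    return h


def split(f, g):
    """Mediating arrow of the product: defined where both f and g are."""
    def h(x):
        a, b = f(x), g(x)
        return None if a is None or b is None else (a, b)
    return h


def domain(f):
    """The partial identity on the domain of f."""
    return lambda x: x if f(x) is not None else None


def map_p(f, s) -> FrozenSet:
    return frozenset(f(x) for x in s)


def flatten_p(ss) -> FrozenSet:
    return frozenset(x for s in ss for x in s)


def lift_p(f):
    def tot(x):
        y = f(x)
        return frozenset() if y is None else frozenset([y])
    return lambda s: flatten_p(map_p(tot, s))


def zero_p(_x) -> FrozenSet:
    return frozenset()


def fst(pair):
    return pair[0]  # left projection


def equal(pair):
    """equal_A: the partial identity on pairs with equal components."""
    return pair if pair[0] == pair[1] else None


diag = compose(fst, equal)  # diag_A = left projection . equal_A


def str_p(pair):
    s, x = pair
    return frozenset((y, x) for y in s)


def stl_p(pair):
    x, s = pair  # same result as the swap-based definition
    return frozenset((x, y) for y in s)


def cross_p(pair):
    return flatten_p(map_p(stl_p, str_p(pair)))


def intersect(pair):
    """Intersection: lift_P diag_A . cross_P."""
    return lift_p(diag)(cross_p(pair))


# notmem: domain of diag_PA . (zero_P split (lift_P diag_A . stl_P))
notmem = domain(compose(diag, split(zero_p, compose(lift_p(diag), stl_p))))


def difference(pair):
    """Hypothetical: keep each x of S for which (x, T) passes notmem."""
    return lift_p(compose(fst, notmem))(str_p(pair))


s, t = frozenset({1, 2, 3, 4}), frozenset({3, 4, 5})
print(sorted(intersect((s, t))))  # [3, 4]
print(sorted(difference((s, t))))  # [1, 2]
```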
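Since the monad is an explicit parameter of the calculus, the same definitions can be instantiated with other collection types. The sketch below is an illustration under our own assumptions: a hypothetical `Ringad` record bundles unit, zero, map and flatten (with `zero` taken as a constant rather than an arrow that ignores its argument) and is instantiated for sets (`frozenset`) and lists. It also shows why the nesting order in the cross product is irrelevant only for sets: for lists, the two nestings give the same pairs in a different order.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Ringad:
    """Hypothetical bundle of the operations the calculus is parametrised by."""
    unit: Callable[[Any], Any]
    zero: Callable[[], Any]
    fmap: Callable[[Callable, Any], Any]
    flatten: Callable[[Any], Any]


SET = Ringad(
    unit=lambda x: frozenset([x]),
    zero=frozenset,
    fmap=lambda f, s: frozenset(f(x) for x in s),
    flatten=lambda ss: frozenset(x for s in ss for x in s),
)

LIST = Ringad(
    unit=lambda x: [x],
    zero=list,
    fmap=lambda f, xs: [f(x) for x in xs],
    flatten=lambda xss: [x for xs in xss for x in xs],
)


def lift(m: Ringad, f: Callable) -> Callable:
    """lift_F f = flatten_F . map_F (Tot_F f); None marks 'undefined'."""
    def tot(x):
        y = f(x)
        return m.zero() if y is None else m.unit(y)
    return lambda s: m.flatten(m.fmap(tot, s))


def str_f(m: Ringad) -> Callable:
    """Strength: (s, x) -> collection of (y, x) for y in s."""
    return lambda pair: m.fmap(lambda y: (y, pair[1]), pair[0])


def stl_f(m: Ringad) -> Callable:
    """Left strength: (x, s) -> collection of (x, y) for y in s."""
    return lambda pair: m.fmap(lambda y: (pair[0], y), pair[1])


def cross_outer_str(m: Ringad) -> Callable:
    """flatten . map stl . str"""
    return lambda pair: m.flatten(m.fmap(stl_f(m), str_f(m)(pair)))


def cross_outer_stl(m: Ringad) -> Callable:
    """flatten . map str . stl"""
    return lambda pair: m.flatten(m.fmap(str_f(m), stl_f(m)(pair)))


def even(n):
    return n if n % 2 == 0 else None  # partial identity for "n is even"


print(lift(LIST, even)([4, 1, 2, 2, 3]))  # [4, 2, 2]
print(cross_outer_str(LIST)(([1, 2], ["a", "b"])))  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
print(cross_outer_stl(LIST)(([1, 2], ["a", "b"])))  # [(1, 'a'), (2, 'a'), (1, 'b'), (2, 'b')]
```

For lists, the lifted selection preserves order and duplicates, so the list instance describes a different query semantics from the set instance even though the definitions are shared.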
Similar resources
Applications of Polytypism in Theorem Proving
Polytypic functions have mainly been studied in the context of functional programming languages. In that setting, applications of polytypism include elegant treatments of polymorphic equality, prettyprinting, and the encoding and decoding of high-level datatypes to and from low-level binary formats. In this paper, we discuss how polytypism supports some aspects of theorem proving: automated ter...
Polytypic Functions over Nested Datatypes (extended Abstract)
The theory and practice of polytypic programming is intimately connected with the initial algebra semantics of datatypes. This is both a blessing and a curse. It is a blessing because the underlying theory is beautiful and well developed. It is a curse because the initial algebra semantics is restricted to so-called regular datatypes. Recent work by R. Bird and L. Meertens [1] on the semantics o...
Polytypic Functions Over Nested Datatypes
The theory and practice of polytypic programming is intimately connected with the initial algebra semantics of datatypes. This is both a blessing and a curse. It is a blessing because the underlying theory is beautiful and well developed. It is a curse because the initial algebra semantics is restricted to so-called regular datatypes. Recent work by R. Bird and L. Meertens [3] on the semantics ...
Mining Nested Association Patterns
We introduce the framework of mining association patterns from nested databases. Two means to nest data items, namely set and sequence, are considered. The term collection refers to a piece of data obtained by such nestings. A natural binary relation defines the generalization hierarchy among all collections. A transaction database is a set of given collections, called transactions. The problem of minin...
Polytypic Functions Over Nested
The theory and practice of polytypic programming is intimately connected with the initial algebra semantics of datatypes. This is both a blessing and a curse. It is a blessing because the underlying theory is beautiful and well developed. It is a curse because the initial algebra semantics is restricted to so-called regular datatypes. Recent work by R. Bird and L. Meertens (1998) on the semanti...
An Object-Oriented Approach to Nested Data Parallelism
This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called “collections” and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for “nested data parallelism.” Few ...